A multi-level context-dependent prosodic model applied to durational modeling

Authors

  • Nicolas Obin
  • Xavier Rodet
  • Anne Lacheret

Abstract

We present a multi-level prosodic model based on estimating prosodic parameters over a set of well-defined linguistic units. Different linguistic units represent different scales of prosodic variation (local and global forms), which makes it possible to estimate, independently at each level, the linguistic factors that explain the variations of the prosodic parameters. The model is applied to syllable-based durational modeling on two read-speech corpora: laboratory speech and acted speech. Compared with a syllable-based baseline model, the proposed approach improves the temporal organization of the predicted durations (correlation score) and reduces model complexity, while showing comparable performance in terms of relative prediction error.

Index Terms: speech synthesis, prosody, multi-level model, context-dependent model.
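The two evaluation criteria named in the abstract can be illustrated with a short sketch. It assumes that the correlation score is the Pearson correlation between observed and predicted syllable durations and that the relative prediction error is the mean of |predicted - observed| / observed; the paper's exact definitions may differ, and the function names and sample durations below are hypothetical.

```python
import math
from typing import Sequence


def pearson_correlation(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted durations (assumed 'correlation score')."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances, computed from deviations around each mean.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_relative_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error normalized by each observed duration (assumed 'relative prediction error')."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical syllable durations in seconds.
observed = [0.12, 0.20, 0.15, 0.30]
predicted = [0.14, 0.18, 0.15, 0.27]
print(f"correlation:    {pearson_correlation(observed, predicted):.3f}")
print(f"relative error: {mean_relative_error(observed, predicted):.3f}")
```

The two measures are complementary: the correlation ignores constant offsets and scaling, so it reflects whether the predicted durations follow the observed lengthening and shortening pattern, while the relative error reflects how far each prediction is from the observed value.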

Publication date: 2009